Voice conversion with smoothed GMM and MAP adaptation
Abstract
In most state-of-the-art voice conversion systems, the speech quality of converted utterances is still unsatisfactory. In this paper, the STRAIGHT analysis-synthesis framework is used to improve the quality. A smoothed GMM and MAP adaptation are proposed for spectrum conversion to avoid the over-smoothing phenomenon of the traditional GMM method. Since frames are processed independently, the GMM-based transformation function may generate discontinuous features. Therefore, a time-domain low-pass filter is applied to the transformation function during the conversion phase. The results of listening evaluations show that the quality of the speech converted by the proposed method is significantly better than that of the traditional GMM method. Meanwhile, speaker identifiability of the converted voice reaches 75%, even when the difference between the source speaker and the target speaker is not very large.
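The abstract does not give the transformation function or the filter design, so the following is only a minimal sketch of the general idea, under stated assumptions. It uses the commonly used joint-density GMM regression, in which each mixture component contributes a linear mapping weighted by its posterior probability, and it assumes that the low-pass filter is a simple moving average applied to those posterior weights along the time axis. The function names, the filter width, and the choice to smooth the posteriors (rather than some other part of the transformation) are illustrative assumptions, not details taken from the paper; STRAIGHT analysis and MAP adaptation are not shown.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) at each row of x (shape T x D)."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, (x - mean).T)      # D x T
    maha = np.sum(sol ** 2, axis=0)                # squared Mahalanobis distance
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def posteriors(x, weights, mu_x, sigma_xx):
    """Frame-wise posterior probability of each mixture component (T x M)."""
    log_p = np.stack(
        [np.log(w) + gaussian_logpdf(x, m, s)
         for w, m, s in zip(weights, mu_x, sigma_xx)],
        axis=1,
    )
    log_p -= log_p.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def smooth_over_time(p, width=5):
    """Assumed low-pass filter: odd-width moving average along the time axis."""
    kernel = np.ones(width) / width
    pad = width // 2
    padded = np.pad(p, ((pad, pad), (0, 0)), mode="edge")
    out = np.stack(
        [np.convolve(padded[:, i], kernel, mode="valid")
         for i in range(p.shape[1])],
        axis=1,
    )
    return out / out.sum(axis=1, keepdims=True)    # keep rows summing to 1

def convert(x, weights, mu_x, mu_y, sigma_xx, sigma_yx, width=5):
    """Map source frames x (T x D) to target features with smoothed weights."""
    p = smooth_over_time(posteriors(x, weights, mu_x, sigma_xx), width)
    y = np.zeros((x.shape[0], mu_y.shape[1]))
    for i in range(len(weights)):
        # Per-component linear regression: mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x)
        a = sigma_yx[i] @ np.linalg.inv(sigma_xx[i])
        y += p[:, [i]] * (mu_y[i] + (x - mu_x[i]) @ a.T)
    return y
```

Without `smooth_over_time`, the weights of neighbouring frames are computed independently and can jump between components, which is one way the frame-to-frame discontinuities mentioned in the abstract can arise; averaging the weights over a short window makes the mapping vary more slowly in time.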
Similar resources
A Voice Conversion Method Combining Segmental GMM Mapping with Target Frame Selection
In this paper, a voice conversion approach that combines two distinct ideas is proposed to improve the converted-voice quality. The first idea is to map spectral features, e.g. discrete cepstrum coefficients (DCC), with segmental Gaussian mixture models (GMMs). That is, a single GMM of a large number of mixture components is replaced here with several voice-content specific GMMs each consisting...
Adaptive voice-quality control based on one-to-many eigenvoice conversion
This paper presents adaptive voice-quality control methods based on one-to-many eigenvoice conversion. To intuitively control the converted voice quality by manipulating a small number of control parameters, a multiple regression Gaussian mixture model (MR-GMM) has been proposed. The MR-GMM also allows us to estimate the optimum control parameters if target speech samples are available. However...
Efficient Gaussian mixture model evaluation in voice conversion
Voice conversion refers to the adaptation of the characteristics of a source speaker's voice to those of a target speaker. Gaussian mixture models (GMMs) have been found to be efficient in the voice conversion task. The GMM parameters are estimated from a training set with the goal of minimizing the mean squared error (MSE) between the transformed and target vectors. Obviously, the quality of the ...
High quality voice conversion based on Gaussian mixture model with dynamic frequency warping
In the voice conversion algorithm based on the Gaussian Mixture Model (GMM), the quality of the converted speech is degraded because the converted spectrum is excessively smoothed. In this paper, we propose a GMM-based algorithm with Dynamic Frequency Warping (DFW) to avoid the over-smoothing. We also propose that the converted spectrum is calculated by mixing the GMM-based converted sp...
Speaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions
The article presents a method for adapting a GMM-based acoustic-articulatory inversion model trained on a reference speaker to another speaker. The goal is to estimate the articulatory trajectories in the geometrical space of a reference speaker from the speech audio signal of another speaker. This method is developed in the context of a system of visual biofeedback, aimed at pronunciation trai...